New High-Speed and Low-Power Radix-2^r Multiplication Algorithms

Authors

  • A. K. Oudjida
  • A. Liacha
  • M. L. Berrandjia
  • N. Chaillet
Abstract

In this paper, a new recursive multibit recoding multiplication algorithm is introduced. It provides a general space-time partitioning of the multiplication problem that not only enables a drastic reduction of the number of partial products (N/r), but also eliminates the need to precompute odd multiples of the multiplicand in higher-radix (r≥3) multiplication. Based on a mathematical proof that any higher radix 2^r can be recursively derived from a combination of two or more lower radices, a series of generalized radix-2^r multipliers is generated by means of four primary radices. A variety of higher-radix two's complement 64x64-bit serial/parallel multipliers are implemented on a Virtex-6 FPGA and characterized in terms of multiply time, energy consumption per multiply operation, and area occupation for r varying from 2 to 64. Compared to a recently published algorithm, savings of 21%, 53%, and 105% are obtained in speed, power, and area, respectively.

Keywords: High-Radix Multiplication; Low-Power Multiplication; Multibit Recoding Multiplication; Partial Product Generator (PPG)

I. BACKGROUND AND MOTIVATION

The continuous refinement of the most widely used design paradigm, the modified Booth algorithm [1] combined with a reduction tree (carry-save adder array, Dadda, ...), has reached saturation. In [2], only slight improvements are achieved: the proposal reduces the number of partial products from N/2+1 to N/2 using different circuit optimization techniques on the critical path. Theoretically, only the signed multibit recoding multiplication algorithm [3] is capable of a drastic reduction (N/r) of the number of partial products, where r+1 is the number of multiplier bits processed simultaneously (1≤r≤N). Unfortunately, this algorithm requires the precomputation of a number of odd multiples of the multiplicand (up to (2^(r-1) - 1)·X) that scales linearly with r.
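
The signed multibit recoding described above can be sketched as follows. This is a minimal illustration, assuming the textbook overlapped-window form of the recoding (an r-bit window plus the top bit of the previous window) and assuming r divides N; the function names `multibit_recode`, `odd_multiples_needed`, and `multiply` are hypothetical and do not come from the paper.

```python
def multibit_recode(y, n, r):
    """Recode an n-bit two's complement multiplier y into n/r signed digits.

    Each digit is read from r+1 bits: an r-bit window plus the top bit of the
    previous window, so every digit lies in [-2^(r-1), 2^(r-1)].
    """
    if n % r != 0:
        raise ValueError("this sketch assumes r divides n")
    u = y & ((1 << n) - 1)            # two's complement bit pattern of y
    digits, prev = [], 0              # prev is the overlap bit y[j-1]
    for j in range(0, n, r):
        window = (u >> j) & ((1 << r) - 1)
        msb = window >> (r - 1)
        digits.append(window - (msb << r) + prev)
        prev = msb
    return digits                     # least significant digit first


def odd_multiples_needed(r):
    """Odd multiples of X greater than 1 that a radix-2^r PPG would precompute."""
    needed = set()
    for d in range(1, (1 << (r - 1)) + 1):
        while d % 2 == 0:
            d //= 2                   # powers of two are plain shifts
        if d > 1:
            needed.add(d)
    return sorted(needed)


def multiply(x, y, n, r):
    """Sum the n/r partial products digit * x * 2^(r*i)."""
    digits = multibit_recode(y, n, r)
    return sum((d * x) << (r * i) for i, d in enumerate(digits))
```

For example, `multibit_recode(-3, 8, 4)` yields the two digits `[-3, 0]`, and `odd_multiples_needed(3)` returns `[3]`, the single odd multiple 3X that a radix-8 partial product generator has to provide.
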
The large number of odd multiples not only requires a considerable number of multiplexers to perform the complex recoding in the PPG, but also dramatically increases the routing density. The resulting overhead offsets the speed and power benefits of the compression factor (N/r). This is the main reason why the multibit recoding algorithm was abandoned. In practice, designs do not exceed r=3 (radix-8).

The current trend [4][5] relies on advanced arithmetic to determine minimal number bases that represent the digits resulting from larger multibit recoding. The objective is to eliminate information redundancy inside the (r+1)-bit slices for a more compact PPG. This is achievable as long as no odd multiples, or only very few, are required. In [4], Seidel et al. introduced a secondary recoding of the digits produced by an initial multibit recoding for 5≤r≤16. The recoding scheme is based on a balanced complete residue system. Though it significantly reduces the number of partial products (N/r for 5≤r≤16), it requires some odd multiples for r≥8. In [5], Dimitrov et al. proposed a new recoding scheme based on the double-base number system for 6≤r≤11. The algorithm is limited to unsigned multiplication and requires a larger number of odd multiples.

Instead of looking for more effective number bases, which is a hard mathematical task, our approach consists in exploiting the four already existing odd-multiple-free recoding algorithms to recursively build up generalized odd-multiple-free radix-2^r recoding schemes. To achieve this goal, the multibit recoding multiplication algorithm [3] is revisited. Its design space is extended by the introduction of a new recursive version that enables a hardware-friendly space-time partitioning of the multiplication problem.
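
The idea of building a higher radix from lower ones can be illustrated with a small sketch. It relies on an identity of plain overlapped-window digits: the digit of a window of r1+r2 bits equals the digit of its lower r1-bit window plus 2^r1 times the digit of its upper r2-bit window. The split (5, 2, 1) and the names `window_digit`, `split_digit`, and `multiply_composite` are assumptions chosen for illustration; the sketch uses plain Booth-style windows throughout and does not reproduce Seidel's secondary recoding, so its r=5 sub-digit is not odd-multiple free.

```python
def window_digit(u, j, r):
    """Signed digit of the r-bit window at bit j of pattern u, plus overlap bit j-1."""
    window = (u >> j) & ((1 << r) - 1)
    msb = window >> (r - 1)
    prev = (u >> (j - 1)) & 1 if j > 0 else 0
    return window - (msb << r) + prev


def split_digit(u, j, parts):
    """Return (digit, shift) pairs of lower-radix digits covering the window at j."""
    terms, shift = [], 0
    for r in parts:
        terms.append((window_digit(u, j + shift, r), shift))
        shift += r
    return terms


def multiply_composite(x, y, n, parts=(5, 2, 1)):
    """Multiply x by an n-bit two's complement y using radix-2^sum(parts) windows."""
    r = sum(parts)
    if n % r != 0:
        raise ValueError("this sketch assumes sum(parts) divides n")
    u = y & ((1 << n) - 1)            # two's complement bit pattern of y
    product = 0
    for j in range(0, n, r):
        # One partial product per window, assembled from its sub-digit terms.
        partial = sum((d * x) << s for d, s in split_digit(u, j, parts))
        product += partial << j
    return product
```

With `parts=(5, 2, 1)` and n=64, each of the 8 windows contributes one partial product assembled from three lower-radix digits, and `multiply_composite(x, y, 64)` agrees with `x * y` for any 64-bit two's complement `y`.
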
Depending on the value of r, which ranges from 2 to N, highly scalable signed multipliers with various levels of parallelism and latency can be systematically generated with insignificant control complexity. The new algorithm also has the merit of recursively reducing the number of partial products (N/r) without any limit on the parameter r and without any need for odd multiples of the multiplicand. It also allows different recoding schemes proposed in the literature to be combined in the same architecture for better multiplier performance.

Several higher-radix two's complement 64x64-bit serial/parallel multipliers based on combined recoding schemes are implemented on a Virtex-6 FPGA and characterized in terms of speed, power, and area occupation for r ranging from 2 to 64. Compared to a new signed version of the algorithm of Dimitrov et al. [5] and to the algorithm of Seidel et al. [4], outstanding results are obtained with the new multibit recoding scheme for r=8, formed by combining the Seidel algorithm (r=5), the MacSorley algorithm (r=2) [1], and the Booth algorithm (r=1) [6].

This work is supported by "Centre de Développement des Technologies Avancées" (CDTA), Algiers, Algeria, in collaboration with FEMTO-ST Institute, Besançon, France.

Author manuscript, hal-00872310, version 1, 11 Oct 2013.

Similar resources

High speed Radix-4 Booth scheme in CNTFET technology for high performance parallel multipliers

A novel and robust scheme for radix-4 Booth scheme implemented in Carbon Nanotube Field-Effect Transistor (CNTFET) technology has been presented in this paper. The main advantage of the proposed scheme is its improved speed performance compared with previous designs. With the help of modifications applied to the encoder section using Pass Transistor Logic (PTL), the corresponding capacitances o...

Multiplier-Accumulator Using Radix-2 Modified Booth Algorithm and SPST Adder Using Verilog

In this paper, we propose a new multiplier-and-accumulator (MAC) architecture for low power and high speed arithmetic. High speed and low power MAC units are required for applications of digital signal processing like Fast Fourier Transform, Finite Impulse Response filters, convolution etc. For improving the speed and reducing the dynamic power, there is a need to reduce the glitches (1 to 0 tr...

A New Low Power 32×32-bit Multiplier

Multipliers are one of the most important building blocks in processors. This paper describes a low-power 32×32-bit parallel multiplier, designed and fabricated using a 0.13 μm double-metal double-poly CMOS process. In order to achieve low-power operation, the multiplier was designed utilizing mainly pass-transistor logic circuits, without significantly compromising the speed performance of the ...

Low Power and Small Area Implementation for OFDM Applications

This paper proposes that several FFT algorithms such as radix-2, radix-4 and split radix were designed using VHDL with the multiplication complexity reduced more than 30% by using the newly proposed CSD constant multipliers instead of the programmable multipliers and the simulations of standard 0.35 μm. The sizes of FFT/IFFT operations are varied in different applications of OFDM systems. The re...

Publication date: 2013